-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] Use EncodedLiteral instead of Literal from python to scala #13814
[query] Use EncodedLiteral instead of Literal from python to scala #13814
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v. nice
hail/python/hail/expr/types.py
Outdated
@@ -1754,6 +1851,9 @@ def call_allele_pair(i): | |||
|
|||
return genetics.Call(alleles, phased) | |||
|
|||
def _convert_to_encoding(self, byte_writer, value): | |||
pass # TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh yeah this one is a bit complex
hail/python/test/hail/test_ir.py
Outdated
) | ||
def test_literal_encoding(value): | ||
lit = hl.literal(value) | ||
round_trip = lit.dtype._from_encoding(lit.dtype._to_encoding(value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love me a round-trip test
ac19f23
to
9cb4764
Compare
Should be mostly if not entirely passing now.
hail/python/hail/expr/types.py
Outdated
byte_writer.write_int64(dim) | ||
|
||
if self.element_type in _numeric_types: | ||
byte_writer.write_bytes(value.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting the element ordering right here makes me a little nervous. Should maybe have some more rigorous tests.
interval = hl.Interval(start=start, end=end, includes_start=True, includes_end=False) | ||
_assert_encoding_roundtrip(start) | ||
_assert_encoding_roundtrip(end) | ||
_assert_encoding_roundtrip(interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each of these three tests is slightly different, but they all ultimately call the same method. What's the argument against just parameterizing _assert_encoding_roundtrip
itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that at first, and looks like I can do this with Call
, but I can't instantiate a Locus
before init
(because it tries to load a reference genome) so I can't include a Locus
in the parametrize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahah right
hail/python/test/hail/test_ir.py
Outdated
{"foo", "bar", "baz"}, | ||
{"a": 1, "b": 2}, | ||
np.array([[1, 2], [3, 4], [5, 6]]), | ||
np.array([[1, 2], [3, 4], [5, 6]]).T, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pursuant to your comment above can we add:
- Test for 0d np.array
- Test for 1d np.array
- Test for 2d np.array
- Test for 4d np.array
- Transposition of all of the above
byte_writer.write_bytes(value.data) | ||
else: | ||
for elem in np.nditer(value, order='F'): | ||
self.element_type._convert_to_encoding(byte_writer, elem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel good about this now. You're using ENDArrayColumnMajor
and the column ordering is Fortran, so ti all checks out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bugs about getting non integral values is because pandas is giving you a pandas NA.
hail/python/hail/expr/types.py
Outdated
while i < length: | ||
missing_byte = 0 | ||
for j in range(min(8, length - i)): | ||
if value[i + j] is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if value[i + j] is None: | |
if value[i + j] in (None, pd.NA): |
hail/python/hail/expr/types.py
Outdated
while i < length: | ||
missing_byte = 0 | ||
for j in range(min(8, length - i)): | ||
if value[keys[i + j]] is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here for missing values from pandas.
6756fff
to
b5719f9
Compare
dismissing as theres been a non-trivial change
Hmm. Seems like the ArraySorter still isn't quite right? I also saw some issues with sockets timing out but I don't know what to make of those yet.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something still seems off here b/c of the test failures
|
||
val sorter = new ArraySorter(EmitRegion(cb.emb, region), ab) | ||
def lessThan(cb: EmitCodeBuilder, region: Value[Region], l: Value[_], r: Value[_]): Value[Boolean] = { | ||
val lk = cb.memoize(sct.loadToSValue(cb, l).asBaseStruct.loadField(cb, 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patrick-schultz is a missing key value handled correctly by this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See failure here: https://batch.hail.is/batches/8064082/jobs/83
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how we're generating these from Python, how could a key value ever be missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything obviously wrong. I can try checking out the branch and debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Patrick!
Hmm. What's special about The other issue seems like it might be simpler, we're just setting a bad type on a field:
|
Ah we have the same ordering problem with sets as we have with dicts |
We sure do |
CHANGELOG: Pipelines that are memory-bound by copious use of
hl.literal
, such asvds.filter_intervals
, require substantially less memory.Closes #13757